Associations between genetically predicted plasma protein levels and Alzheimer’s disease risk: a study using genetic prediction models

Background: Specific peripheral proteins have been implicated in the development of Alzheimer’s disease (AD). However, the roles of additional, novel protein biomarkers in AD etiology remain elusive. The availability of large-scale AD GWAS and plasma proteomic data provides the resources needed to identify causally relevant circulating proteins that may serve as risk factors for AD and as potential therapeutic targets.
Methods: We established and validated genetic prediction models for plasma protein levels and used them as instruments to investigate the associations between genetically predicted protein levels and AD risk. We studied 85,934 cases (including 46,828 proxy cases) and 401,577 controls of European descent.
Results: We identified 69 proteins whose genetically predicted concentrations were associated with AD risk. The drugs almitrine and ciclopirox, which target ATP1A1, were suggested as potential candidates for repositioning for AD treatment.
Conclusions: Our study provides additional insights into the mechanisms underlying AD and into potential therapeutic strategies.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13195-023-01378-4.


Introduction
Alzheimer's disease (AD), the most common cause of dementia, has become a growing public health concern due to an unprecedented increase in life expectancy globally. In the USA, reported deaths from AD increased by 146.2% between 2000 and 2018, making AD the sixth leading cause of death [1]. The annual cost of caring for AD patients is predicted to reach one trillion dollars by 2050. AD is an irreversible, progressive disorder whose neuropathological changes often begin long before any symptoms become apparent. The abnormal accumulation of amyloid-beta (Aβ) plaques, a hallmark of AD, is known to occur as early as two decades before the onset of clinical symptoms [2]. Abnormal phosphorylation of tau, the second canonical AD protein aggregate, is believed to occur shortly thereafter (15-20 years before symptom onset) [3]. A great deal of research effort has focused on targeting pathological Aβ aggregates and tau neurofibrillary tangles, and several drugs, including Aduhelm® [4] and Leqembi® [5], have been approved by the U.S. Food and Drug Administration (FDA). These drugs may relieve symptoms, but whether they can cure AD requires further study. It therefore remains critical to identify novel biomarkers and biological pathways that may contribute to AD risk.
Physiological changes that take place outside the brain (e.g., immune, vascular, and metabolic changes) have been shown to directly influence the function of neural cells and to relate strongly to the risk of developing AD [6,7]. Identifying the circulating peripheral proteins that drive the associations between peripheral biological changes and increased AD risk may enhance our understanding of AD pathogenesis and thereby inform future therapeutic strategies. In addition to Aβ and tau, a number of other proteins have been linked to AD [8]. Translational and epidemiological research indicates that biological processes operating outside the central nervous system can contribute considerably to an individual's risk of developing AD [6,9]. These peripheral processes can be reflected in the protein composition of plasma and serum, i.e., in secreted proteins. Identifying proteins that are causally associated with AD-relevant outcomes will deepen our understanding of how peripheral molecular changes, biological pathways, and regulatory mechanisms influence AD risk.
AD is highly heritable. Twin and family studies suggest that genetic factors may play a role in at least 80% of AD cases [10]. A recent genome-wide association study (GWAS) identified 29 independent disease-associated risk loci by studying 71,880 (proxy) cases and 383,378 (proxy) controls of European ancestry [11]. The present study aimed to identify novel protein biomarkers for AD by evaluating the associations between genetically predicted protein concentrations and AD risk, a proteome-wide association study (PWAS) design. Similar to Mendelian randomization (MR) and transcriptome-wide association studies (TWAS) [12][13][14][15], this design can potentially reduce biases common in conventional epidemiological studies, such as selection bias, residual confounding, and reverse causality. We established and validated comprehensive protein genetic prediction models that capture the genetically regulated components of protein levels by using both cis- and trans-acting elements, which provides higher statistical power than using cis-acting elements alone (a common practice in related studies). We then related genetically predicted plasma concentrations to AD risk and, in doing so, implicated 69 circulating proteins in AD pathogenesis, shedding light on the peripheral biology of AD.

Methods
The genome and plasma proteome data of participants of European descent in the INTERVAL study (subcohort 1 and subcohort 2) were used to establish and validate protein genetic prediction models. The INTERVAL dataset has been described in detail elsewhere [16]. In brief, participants were aged 18-80 years and were generally in good health. The SOMAscan assay was used to measure the relative concentrations of 3620 plasma proteins or protein complexes. Quality control (QC) was performed at the sample and SOMAmer levels. After excluding eight non-human protein targets, 3283 SOMAmers remained for further study. DNA samples were assayed for ~830,000 variants on the Affymetrix Axiom UK Biobank genotyping array. Standard sample and variant QC was conducted as described in the original publication [16]. SNPs were then phased using SHAPEIT3 and imputed using a combined 1000 Genomes Phase 3-UK10K reference panel via the Sanger Imputation Server, yielding over 87 million imputed variants. Imputed SNPs were filtered using the following criteria: (1) imputation quality ≥ 0.7, (2) minor allele frequency (MAF) ≥ 5%, (3) Hardy-Weinberg equilibrium (HWE) P ≥ 5 × 10⁻⁶, (4) missing rate < 5%, and (5) presence in the 1000 Genomes Project data for European populations. In total, 4,662,360 variants passed these criteria.
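The five variant filters above can be expressed as a single predicate. The following is a minimal Python sketch, assuming each variant's QC metrics are already available as plain fields; the `VariantQC` record, the `passes_qc` helper, and the example variants are hypothetical and are not part of any tool used in the study.

```python
from dataclasses import dataclass


@dataclass
class VariantQC:
    """Hypothetical per-variant summary used only for this illustration."""
    variant_id: str
    info: float          # imputation quality score
    maf: float           # minor allele frequency (0-0.5)
    hwe_p: float         # Hardy-Weinberg equilibrium P-value
    missing_rate: float  # fraction of samples with a missing genotype
    in_1kg_eur: bool     # present in 1000 Genomes European data


def passes_qc(v: VariantQC,
              min_info: float = 0.7,
              min_maf: float = 0.05,
              min_hwe_p: float = 5e-6,
              max_missing: float = 0.05) -> bool:
    """Return True if the variant meets all five filtering criteria."""
    return (
        v.info >= min_info
        and v.maf >= min_maf
        and v.hwe_p >= min_hwe_p
        and v.missing_rate < max_missing
        and v.in_1kg_eur
    )


# Hypothetical variants, each failing at most one criterion.
variants = [
    VariantQC("var1", info=0.95, maf=0.21, hwe_p=0.40, missing_rate=0.01, in_1kg_eur=True),
    VariantQC("var2", info=0.65, maf=0.30, hwe_p=0.50, missing_rate=0.00, in_1kg_eur=True),  # low INFO
    VariantQC("var3", info=0.90, maf=0.02, hwe_p=0.50, missing_rate=0.00, in_1kg_eur=True),  # rare
    VariantQC("var4", info=0.90, maf=0.10, hwe_p=1e-7, missing_rate=0.00, in_1kg_eur=True),  # HWE
]
kept = [v.variant_id for v in variants if passes_qc(v)]
print(kept)  # ['var1']
```

Note that the thresholds are inclusive (≥) except for the missing rate, which must be strictly below 5%, mirroring the wording of the criteria.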
In subcohort 1 (N = 2481), protein levels were log-transformed and adjusted for age, sex, the duration between blood draw and processing, and the first three principal components of ancestry. For the rank-inverse normalized residuals of each protein, we followed the TWAS/FUSION framework [17] to develop genetic prediction models. Potentially associated SNPs were defined as those with a false discovery rate (FDR) < 0.05 in the cis-region and those with P ≤ 5 × 10⁻⁸ in trans-regions, where the cis-region was defined as the region within 1 Mb of the transcriptional start site (TSS) of the gene encoding the target protein. We then extracted all SNPs located within 100 kb of these potentially associated SNPs, excluding strand-ambiguous SNPs, to serve as candidate predictors. To include predictors from both cis and trans regions in a single model, we converted all chromosome numbers to Z, combining them into a single pseudo-chromosome. Four methods, namely best linear unbiased predictor (BLUP), elastic net, LASSO, and top1, were used to build the models. For models with prediction performance (R²) of at least 0.01, a threshold commonly used in related studies [15,18-23], we further conducted external validation using subcohort 2 (N = 820) data. In brief, we generated predicted protein levels by applying the established models to the genetic data of subcohort 2 and then compared the predicted and measured levels of each protein. Proteins with a model prediction R² ≥ 0.01 in subcohort 1 and a correlation coefficient ≥ 0.1 in subcohort 2 were selected for the downstream association analysis.
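The phenotype preparation and predictor-selection steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FUSION implementation: the rank-inverse normal transform uses Blom's offset (c = 3/8), which is an assumption since the offset is not stated; the cis FDR is computed with the Benjamini-Hochberg procedure across a single protein's cis SNPs, which is also an assumption; and `SnpAssoc`, `candidate_predictors`, and all SNP data are hypothetical.

```python
from dataclasses import dataclass
from statistics import NormalDist


def rank_inverse_normal(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform (Blom offset c = 3/8 assumed); ties get average ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend over a run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # 1-based average rank
        i = j + 1
    dist = NormalDist()
    return [dist.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # largest P-value first, enforcing monotonicity
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        q[idx] = running_min
    return q


@dataclass
class SnpAssoc:
    snp_id: str
    chrom: str
    pos: int
    a1: str
    a2: str
    pvalue: float  # association P-value with the protein's residuals


def is_ambiguous(s):
    """A/T and C/G SNPs cannot be strand-aligned from their alleles alone."""
    return {s.a1, s.a2} in ({"A", "T"}, {"C", "G"})


def candidate_predictors(snps, gene_chrom, tss, cis_window=1_000_000,
                         flank=100_000, cis_fdr=0.05, trans_p=5e-8):
    in_cis = [s.chrom == gene_chrom and abs(s.pos - tss) <= cis_window for s in snps]
    cis = [s for s, c in zip(snps, in_cis) if c]
    trans = [s for s, c in zip(snps, in_cis) if not c]
    lead = []
    if cis:  # FDR across this protein's cis SNPs (an assumption)
        lead += [s for s, q in zip(cis, bh_qvalues([s.pvalue for s in cis])) if q < cis_fdr]
    lead += [s for s in trans if s.pvalue <= trans_p]
    # keep every non-ambiguous SNP within `flank` bp of any lead SNP
    return sorted(
        s.snp_id for s in snps
        if not is_ambiguous(s)
        and any(s.chrom == ls.chrom and abs(s.pos - ls.pos) <= flank for ls in lead)
    )


# Hypothetical SNPs for a protein whose gene has its TSS at chr1:1,000,000.
snps = [
    SnpAssoc("rs1", "1", 1_050_000, "A", "G", 1e-6),   # cis, strong
    SnpAssoc("rs2", "1", 1_120_000, "A", "C", 0.5),    # 70 kb from rs1
    SnpAssoc("rs3", "1", 1_140_000, "A", "T", 0.4),    # ambiguous A/T
    SnpAssoc("rs4", "5", 20_000_000, "C", "T", 1e-9),  # trans, genome-wide significant
    SnpAssoc("rs5", "5", 20_090_000, "G", "A", 0.9),   # 90 kb from rs4
    SnpAssoc("rs6", "5", 21_000_000, "G", "T", 1e-4),  # trans, not significant
]
print(candidate_predictors(snps, gene_chrom="1", tss=1_000_000))  # ['rs1', 'rs2', 'rs4', 'rs5']
```

The lenient FDR threshold for cis SNPs and the genome-wide threshold for trans SNPs reflect the much larger search space for trans signals. Fitting BLUP, elastic net, LASSO, and top1 on the selected SNPs is left to FUSION and is not reproduced here.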
To assess the associations between genetically predicted circulating protein levels and AD risk, we applied the validated protein prediction models to summary statistics from a large GWAS meta-analysis of AD risk [24]. Instead of including only clinically diagnosed AD cases, this GWAS combined clinically diagnosed cases with by-proxy phenotypes based on parental diagnoses, an approach shown to substantially increase statistical power [25]. In brief, this study included a total of 85,934 cases (39,106 clinically diagnosed AD cases and 46,828 proxy AD cases) and 401,577 controls from cohorts including … the Copenhagen City Heart Study (CCHS), the Bonn studies, and UK Biobank. Detailed information on the study participants and on the genotyping and imputation methods for each included study can be found in the supplementary files of the original GWAS paper [24]. Risk estimates from the single-marker association analyses were adjusted for sex, batch (if applicable), age (if applicable), and top principal components (PCs).
The TWAS/FUSION framework was used to test the protein-AD associations, leveraging the correlation (linkage disequilibrium, LD) between the SNPs included in the prediction models, estimated from 1000 Genomes Project phase 3 data of European ancestry [17]. We calculated the PWAS test statistic as $Z_{\mathrm{PWAS}} = \dfrac{w^{\top} Z}{\left(w^{\top} \Sigma_{s,s}\, w\right)^{1/2}}$, where $Z$ is the vector of standardized effect sizes (Wald z-scores) of the SNPs for a given protein, $w$ is the vector of prediction weights for the abundance of the protein being tested, and $\Sigma_{s,s}$ is the LD matrix of the SNPs estimated from the 1000 Genomes Project reference panel. A Bonferroni-corrected P-value < 0.05 was used to determine significant associations between genetically predicted protein concentrations and AD risk.
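The PWAS statistic above is a weighted sum of GWAS z-scores divided by its standard deviation under the null, where the LD matrix accounts for correlation between SNPs. A minimal pure-Python sketch follows, assuming the weights, z-scores, and LD matrix are already aligned to the same SNPs and effect alleles; `pwas_z`, `two_sided_p`, and the toy numbers are hypothetical illustrations, not FUSION's code.

```python
import math


def pwas_z(weights, gwas_z, ld):
    """FUSION-style PWAS statistic: Z = w'Z / (w' Sigma w)^(1/2).

    weights: prediction weights w for the model SNPs
    gwas_z:  GWAS Wald z-scores Z for the same SNPs, aligned to the same allele
    ld:      LD (correlation) matrix Sigma among those SNPs, as nested lists
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n:
        raise ValueError("weights, z-scores and LD matrix must be aligned")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))  # quadratic form w' Sigma w
    if variance <= 0:
        raise ValueError("w' Sigma w must be positive")
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided normal P-value, 2 * (1 - Phi(|z|)), via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Toy example: two SNPs in moderate LD (r = 0.5); all numbers are hypothetical.
w = [0.3, 0.2]
z = [3.0, 2.5]
sigma = [[1.0, 0.5],
         [0.5, 1.0]]
print(round(pwas_z(w, z, sigma), 3))  # 3.212
```

As a check on the Z-to-P conversion, `two_sided_p(4.47)` is roughly 7.8 × 10⁻⁶ and `two_sided_p(1.50)` roughly 0.13, in line with the P-values reported alongside those Z-scores for Siglec-3 and YKL-40 (small differences reflect rounding of the reported Z).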
Ingenuity Pathway Analysis (IPA; Ingenuity Systems Inc., USA) and protein-protein interaction (PPI) analysis via the STRING database (version 12.0) at a confidence level of 0.400 [26] were used to cluster the identified proteins and classify their enriched pathways, using the default STRING interaction sources (text mining, experiments, databases, co-expression, neighborhood, gene fusion, and co-occurrence). We also investigated potentially repositionable drugs targeting the genes encoding the associated proteins using the GREP (Genome for REPositioning drugs) tool [27]. We further conducted molecular docking analysis with ATP1A1 as the target protein and almitrine and ciclopirox as the drug agents [28].
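Counting how many interaction partners each protein has above the 0.400 confidence cutoff is a simple graph-degree computation. The sketch below assumes the STRING network has been exported as (protein, protein, combined score) rows with scores on a 0-1 scale (some STRING files use a 0-1000 scale, where the equivalent cutoff is 400); `interaction_partners`, `hub_proteins`, and the edge list are hypothetical placeholders, not STRING's API.

```python
from collections import defaultdict


def interaction_partners(edges, min_score=0.4):
    """Map each protein to its set of distinct partners with score >= min_score.

    edges: iterable of (protein_a, protein_b, score), scores on a 0-1 scale.
    """
    partners = defaultdict(set)
    for a, b, score in edges:
        if score >= min_score and a != b:
            partners[a].add(b)
            partners[b].add(a)
    return partners


def hub_proteins(edges, min_degree=3, min_score=0.4):
    """Proteins with at least `min_degree` distinct partners above the cutoff."""
    partners = interaction_partners(edges, min_score)
    return sorted(p for p, s in partners.items() if len(s) >= min_degree)


# Hypothetical edge list; protein names and scores are placeholders.
edges = [
    ("P1", "P2", 0.90),
    ("P2", "P1", 0.90),  # same pair listed in the reverse direction
    ("P1", "P3", 0.55),
    ("P1", "P4", 0.41),
    ("P2", "P3", 0.30),  # below the 0.400 cutoff
]
print(hub_proteins(edges))  # ['P1']
```

Storing partners in sets makes the degree count robust to the same pair appearing twice in opposite directions, which is common in exported interaction tables.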

Results
In this study, potential predictors were identified for 1870 proteins, and protein prediction models were successfully established for 1864 of them. For the remaining 1413 proteins, no SNP showed an association at FDR < 0.05 (cis SNPs) or P ≤ 5 × 10⁻⁸ (trans SNPs). After internal and external validation, 1389 proteins showed internal and external validation performance of R² ≥ 0.01. The median external validation R² was 0.06, and 459, 189, and 38 proteins showed external validation R² ≥ 0.1, 0.2, and 0.5, respectively. Overall, proteins that were predicted well in INTERVAL subcohort 1 also tended to be predicted well in subcohort 2 (correlation coefficient of 0.96 between the R² values in the two datasets; Fig. 1). Using the TWAS/FUSION framework, we examined associations for a total of 1340 proteins. The remaining 49 proteins were not tested because more than half of the SNPs in their models were absent from the AD GWAS summary statistics. We identified 69 proteins whose genetically predicted concentrations were associated with AD risk after Bonferroni correction (P < 3.01 × 10⁻⁵) (Table 1; Fig. 2). Of these 69 proteins, 45 showed positive associations and 24 showed inverse associations (Table 1; Fig. 2).
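The two filters that determine which proteins reach the association test, namely the validation thresholds and the GWAS coverage rule, can be sketched as below. This assumes external performance is summarized by the Pearson correlation between predicted and measured levels, so an external r of 0.1 corresponds to R² = 0.01 if R² is taken as the squared correlation. The helper names and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between predicted and measured protein levels."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def passes_validation(r2_internal, r_external, min_r2=0.01, min_r=0.1):
    """Internal R^2 >= 0.01 (subcohort 1) and external r >= 0.1 (subcohort 2)."""
    return r2_internal >= min_r2 and r_external >= min_r


def enough_gwas_coverage(model_snps, gwas_snps):
    """False when more than half of a model's SNPs are absent from the GWAS summary."""
    missing = sum(1 for s in model_snps if s not in gwas_snps)
    return 2 * missing <= len(model_snps)


# Hypothetical predicted vs. measured levels for one protein in subcohort 2.
predicted = [0.1, -0.4, 0.8, 0.3, -0.2]
measured = [0.0, -0.1, 0.5, 0.6, -0.3]
r = pearson_r(predicted, measured)
print(passes_validation(r2_internal=0.05, r_external=r))  # True
print(enough_gwas_coverage(["rs1", "rs2", "rs3", "rs4"], {"rs1", "rs2"}))  # True
```

Requiring a positive correlation of at least 0.1, rather than only R² ≥ 0.01, also rules out models whose predictions are inversely correlated with measured levels.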
For the proteins associated with AD risk, a Core Analysis was performed in IPA. Assembly of RNA Polymerase I Complex and DNA Double-Strand Break Repair by Non-Homologous End Joining were the two canonical pathways showing significant enrichment at P < 0.05 (Table S2; Figure S1). In the Network Analysis, a network of Cell-To-Cell Signaling and Interaction, Hematological System Development and Function, and Immune Cell Trafficking was identified, involving 19 of the associated proteins (Table S3; Figure S2). The top disease and functional categories identified in the Disease and Biological Functions analysis are shown in Table S4.
Protein interactions among the 69 associated proteins were investigated using the STRING database (Figure S3). In the network, five proteins (ILT-4, PRPC, SHPS1, Siglec-3, and Siglec-9) had three or more interactions with other proteins. Among them, Siglec-3 (also known as CD33) has been reported as a risk factor for AD, and both its mRNA level and protein abundance were found to be increased in AD patients compared with age-matched controls [29]. This is consistent with the positive association observed in the current study (Z-score = 4.47, P = 7.78 × 10⁻⁶).
Based on the Anatomical Therapeutic Chemical (ATC) classification test in GREP, the drugs almitrine and ciclopirox, which target ATP1A1, were suggested as potential candidates for repositioning for AD treatment (odds ratio (OR) = 63.0, P = 0.022 for almitrine; OR = 35.9, P = 0.035 for ciclopirox).
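Enrichment tests of this kind compare how often the query genes are targets of a drug category relative to all other genes. The sketch below shows a one-sided Fisher's exact test on a 2×2 table, computed directly from the hypergeometric distribution. It assumes, as is common for such enrichment tests, that GREP's ATC test is Fisher-based, but the exact table GREP constructs is not described here, so the layout is an assumption and the counts are hypothetical rather than the study's.

```python
from math import comb


def fisher_enrichment(a, b, c, d):
    """One-sided (enrichment) Fisher's exact test on the 2x2 table

                        drug-category target   not a target
        query genes              a                  b
        other genes              c                  d

    Returns (sample odds ratio, P-value for observing >= a query-gene targets).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    upper = min(row1, col1)
    # hypergeometric upper tail with all margins fixed
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    p = tail / comb(n, col1)
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, p


# Hypothetical counts, not the study's actual table.
or_, p = fisher_enrichment(3, 1, 1, 3)
print(or_, round(p, 4))  # 9.0 0.2429
```

With very few target genes, as when a single gene such as ATP1A1 drives the signal, the sample odds ratio can be large while the P-value remains modest, which is one reason such repositioning results are best treated as hypotheses.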

Discussion
To our knowledge, the present study is the first large population-based study to systematically investigate the associations between genetically predicted circulating protein concentrations in plasma and AD risk using comprehensive protein prediction models as genetic instruments. Overall, we identified 69 proteins that were significantly associated with AD risk after Bonferroni correction. If validated in future studies, our findings could add substantial new knowledge to the etiology of AD and provide a list of protein markers to facilitate precision prevention or therapeutic trials for AD. Recently, plasma proteins including Aβ42 and phosphorylated tau (p-tau217, p-tau181, and others) have been identified as promising plasma biomarkers for clinically and pathologically defined AD [32][33][34]. While these biomarkers will be extremely useful for risk stratification of participants, it remains vitally important to identify additional AD biomarkers to further understand the pathophysiological processes leading to AD. By examining associations of genetically predicted plasma protein levels with AD risk, we can go beyond a traditional examination of protein-AD associations and begin to understand whether proteins may be causally relevant. For example, although plasma levels of YKL-40 have been associated with AD [35], we did not observe evidence of an association for genetically predicted levels of YKL-40 (Z = 1.50; P = 0.13). This suggests that although specific proteins such as YKL-40 may be strong biomarkers, they may not be causally relevant.

Using proteomic and genetic methods, we identified multiple AD-associated proteins that are reported here for the first time (Table 1). For some of them, existing functional evidence already supports potential links with AD. For example, cofilin-1, a major actin depolymerizer in the central nervous system, plays a crucial role in maintaining the structure and proper function of neurons [36]. Cofilin rods, which are composed primarily of actin and cofilin-1 and form under stress conditions, have been suggested to contribute to neurodegenerative diseases such as AD by disrupting dendritic transport and inducing synaptic dysfunction [36,37]. Additional research is warranted to understand the identified associations for the other proteins.
Using GREP, the drugs almitrine and ciclopirox were suggested as potentially repositionable for AD treatment. A double-blind controlled study of patients with memory loss, lack of concentration, impaired mental alertness, and emotional instability reported that almitrine-raubasine could improve cognitive impairment [35]. Another controlled multicenter study of patients with cognitive decline (assessed by the MMSE and SCAG) also suggested that almitrine-raubasine significantly improved symptoms compared with placebo [38]. Three other trials conducted in China, involving 206 patients with vascular dementia, also supported a significant beneficial effect of the almitrine-raubasine combination on cognitive function measured by the MMSE [39], although a high risk of bias was observed. Other research suggested that ciclopirox could protect neuronal cells from cell death and astrocytes from peroxynitrite toxicity [40,41].
Future work is warranted to investigate whether almitrine and ciclopirox can indeed be used to treat AD.
The strengths of our study include high statistical power to identify AD-associated proteins, given the large sample size of the main association analysis. Instead of using only individual protein quantitative trait loci (pQTLs) as instruments, we developed comprehensive protein genetic prediction models using a state-of-the-art method and externally validated their performance before applying them in downstream association tests. Our previous work has shown that, compared with individual QTLs, comprehensive prediction models can better capture the genetically regulated components of molecular levels and thus further increase statistical power [42]. In two recently published studies, plasma pQTLs were used to assess proteins potentially associated with AD risk [43,44]. The current work is expected to have improved power and scope compared with these two studies. In particular, Walker et al. [44] tested only proteins whose directly measured levels showed an association, and Yang et al. [43] used a relatively small dataset (n = 636) to determine plasma pQTLs and, correspondingly, identified pQTLs for only 127 proteins for association analyses. In Wingo et al. [45], prediction models for 376 proteins in brain tissue were established, and 13 proteins were identified as associated with AD risk.
It is also worth noting that the previous studies used AD GWAS summary statistics with fewer cases and controls. Walker and Yang used the GWAS summary data from Kunkle et al. [46], comprising 21,982 clinically diagnosed AD cases and 41,944 cognitively normal controls, while Wingo used the AD GWAS summary data from Jansen et al. [11], encompassing 71,880 cases (clinically diagnosed AD and AD-by-proxy) and 383,378 controls. In the present study, we used larger GWAS summary data from a more recent study [24], including 85,934 cases (39,106 clinically diagnosed AD and 46,828 proxy AD cases) and 401,577 controls. We examined the associations of the proteins reported in these previous studies in the current work. Only three of the previously reported proteins showed consistent associations (same effect direction and nominal P < 0.05) in the current work (Table S5). To further examine the robustness of these results, we extended the examination to two sets of protein genetic prediction models established independently by others using different methods, namely the Atherosclerosis Risk in Communities (ARIC) European ancestry models [47] and the INTERVAL cis-models [48]. Notably, when we focused only on plasma, most of the examined proteins did not show significant associations with AD risk using either the ARIC European ancestry models or the INTERVAL cis-models. This observation, which aligns well with the results based on our models, suggests that these prior findings may be false positives. Such a discrepancy could be partly attributed to the limited ability of individual pQTL SNPs to capture the genetically regulated components of protein levels. Further studies are warranted to better characterize the other previously reported proteins.
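The consistency criterion used to compare previously reported proteins with the current results (same effect direction and nominal P < 0.05) can be written as a one-line rule. This sketch assumes each prior association is summarized by a signed Z-score; `is_consistent` and the protein entries are hypothetical placeholders, not the contents of Table S5.

```python
def is_consistent(prior_z, current_z, current_p, alpha=0.05):
    """Same effect direction as the prior report and nominal P < alpha in the current analysis."""
    return prior_z * current_z > 0 and current_p < alpha


# Hypothetical proteins: (Z reported previously, Z here, P here).
checks = {
    "protein_A": (3.1, 2.4, 0.016),
    "protein_B": (-2.8, 1.1, 0.27),
    "protein_C": (4.0, -2.2, 0.028),
}
consistent = sorted(name for name, (z0, z1, p1) in checks.items()
                    if is_consistent(z0, z1, p1))
print(consistent)  # ['protein_A']
```

Multiplying the two Z-scores tests direction agreement in one step; a protein such as the hypothetical `protein_C` is nominally significant but in the opposite direction, so it does not count as a replication.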
Several limitations of the current work also need to be acknowledged. First, our findings may be subject to pleiotropic effects, which limits the ability to draw causal inferences. Second, because we used genetic instruments to predict plasma protein levels, we were able to capture only the genetically regulated components of protein concentrations and not the components influenced by exogenous exposures. As with TWAS, our PWAS investigates the relationship between the genetically determined components of protein levels and disease risk. Further prospective studies with protein levels measured in pre-disease plasma samples are needed to better evaluate this relationship. Finally, when establishing the genetic models to estimate the genetically determined components of protein levels, we carefully controlled for age, sex, the duration between blood draw and processing, and top genetic principal components. However, factors such as smoking and body mass index (BMI) were not controlled for during model construction using the INTERVAL dataset because the relevant data were not available to us [49]. Future studies are needed to validate our findings.
In conclusion, in this large association study using genetic instruments, we identified multiple novel proteins associated with AD risk. If validated by further investigations, our findings may add to the knowledge of the mechanisms underlying AD.

Availability of data and materials
Summary statistics of the GWAS meta-analysis of AD risk by Bellenguez et al. are available at the GWAS Catalog (https://www.ebi.ac.uk/gwas/) under accession no. GCST90027158. For the INTERVAL SomaLogic study, the individual-level genotype and protein data, and the full summary association results from the genetic analysis, are available through the European Genome-phenome Archive (accession number EGAS00001002555). Summary association results are

Fig. 1
Fig. 1 Performance of protein expression prediction models in the INTERVAL subcohort 1 and subcohort 2 datasets for proteins showing internal and external validation performance of R² ≥ 0.01

Fig. 2
Fig. 2 Association Z-scores for proteins associated with AD risk at a Bonferroni-corrected P-value ≤ 0.05

Fig. 3

Fig. 4
Fig. 4 The 3D structure (left) and 2D schematic diagram (right) of the potential target ATP1A1 and ciclopirox

Table 1
Proteins showing a significant association with Alzheimer's disease risk for their genetically predicted concentrations in plasma
a … net; blup: best linear unbiased predictor
b SNPs within 1 Mb of the protein-encoding gene
c The Z-score represents the direction of the association between genetically predicted protein levels and AD risk
d Risk SNPs identified in previous GWAS or fine-mapping studies. The SNP list is included in Table S1
e NA indicates that no risk SNP was reported on the chromosome